Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity

نویسندگان

  • Quynh C Nguyen
  • Dapeng Li
  • Hsien-Wen Meng
  • Suraj Kath
  • Elaine Nsoesie
  • Feifei Li
  • Ming Wen
چکیده

BACKGROUND Studies suggest that where people live, play, and work can influence health and well-being. However, the dearth of neighborhood data, especially data that is timely and consistent across geographies, hinders understanding of the effects of neighborhoods on health. Social media data represents a possible new data resource for neighborhood research. OBJECTIVE The aim of this study was to build, from geotagged Twitter data, a national neighborhood database with area-level indicators of well-being and health behaviors. METHODS We utilized Twitter's streaming application programming interface to continuously collect a random 1% subset of publicly available geolocated tweets for 1 year (April 2015 to March 2016). We collected 80 million geotagged tweets from 603,363 unique Twitter users across the contiguous United States. We validated our machine learning algorithms for constructing indicators of happiness, food, and physical activity by comparing predicted values to those generated by human labelers. Geotagged tweets were spatially mapped to the 2010 census tract and zip code areas they fall within, which enabled further assessment of the associations between Twitter-derived neighborhood variables and neighborhood demographic, economic, business, and health characteristics. RESULTS Machine labeled and manually labeled tweets had a high level of accuracy: 78% for happiness, 83% for food, and 85% for physical activity for dichotomized labels with the F scores 0.54, 0.86, and 0.90, respectively. About 20% of tweets were classified as happy. Relatively few terms (less than 25) were necessary to characterize the majority of tweets on food and physical activity. Data from over 70,000 census tracts from the United States suggest that census tract factors like percentage African American and economic disadvantage were associated with lower census tract happiness. Urbanicity was related to higher frequency of fast food tweets. Greater numbers of fast food restaurants predicted higher frequency of fast food mentions. Surprisingly, fitness centers and nature parks were only modestly associated with higher frequency of physical activity tweets. Greater state-level happiness, positivity toward physical activity, and positivity toward healthy foods, assessed via tweets, were associated with lower all-cause mortality and prevalence of chronic conditions such as obesity and diabetes and lower physical inactivity and smoking, controlling for state median income, median age, and percentage white non-Hispanic. CONCLUSIONS Machine learning algorithms can be built with relatively high accuracy to characterize sentiment, food, and physical activity mentions on social media. Such data can be utilized to construct neighborhood indicators consistently and cost effectively. Access to neighborhood data, in turn, can be leveraged to better understand neighborhood effects and address social determinants of health. We found that neighborhoods with social and economic disadvantage, high urbanicity, and more fast food restaurants may exhibit lower happiness and fewer healthy behaviors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity.

OBJECTIVES Using publicly available, geotagged Twitter data, we created neighborhood indicators for happiness, food and physical activity for three large counties: Salt Lake, San Francisco and New York. METHODS We utilize 2.8 million tweets collected between February-August 2015 in our analysis. Geo-coordinates of where tweets were sent allow us to spatially join them to 2010 census tract loc...

متن کامل

Geotagged US Tweets as Predictors of County-Level Health Outcomes, 2015-2016.

OBJECTIVES To leverage geotagged Twitter data to create national indicators of the social environment, with small-area indicators of prevalent sentiment and social modeling of health behaviors, and to test associations with county-level health outcomes, while controlling for demographic characteristics. METHODS We used Twitter's streaming application programming interface to continuously coll...

متن کامل

Compensating the gaps caused by aging: Analyzing the main themes of physical activity in the neighborhood environment from the perspective of the older adults(A phenomenological approach)

Introduction: Physical activity and aerobic exercises have anti-anxiety and anti-depressant effects in compensating for the problems and gaps caused by psychological issues among the older adult. Epidemiological studies show that physical activity is related to better mental health and flexibility against mental distress, such as depression and anxiety symptoms. Therefore, the purpose of this r...

متن کامل

Understanding Human Mobility from Twitter

Understanding human mobility is crucial for a broad range of applications from disease prediction to communication networks. Most efforts on studying human mobility have so far used private and low resolution data, such as call data records. Here, we propose Twitter as a proxy for human mobility, as it relies on publicly available data and provides high resolution positioning when users opt to ...

متن کامل

The recognition of the necessity of for community-based disaster risk management to reduce the risk of vulnerability to earthquake disaster (case study: YousefAbad neighborhood of Tehran)

Disaster management and current attitudes in this area only focus on this areachr('39')s physical vulnerabilities, raising urban residentschr('39') exposure to these challenges in front of the earthquake. On the other hand, Incidental actions include reducing the vulnerability and the physical strengthening and promotion of poor organization during the disaster; they ignored the capabilities an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2  شماره 

صفحات  -

تاریخ انتشار 2016